Learning and controlling the source-filter representation of speech with a variational autoencoder
Abstract
Understanding and controlling latent representations in deep generative models is a challenging yet important problem for analyzing, transforming, and generating various types of data. In speech processing, inspired by the anatomical mechanisms of phonation, the source-filter model considers that speech signals are produced from a few independent and physically meaningful continuous latent factors, among which the fundamental frequency $f_0$ and the formants are of primary importance. In this work, we start from a variational autoencoder (VAE) trained in an unsupervised manner on a large dataset of unlabeled natural speech signals, and we show that the source-filter model of speech production naturally arises as orthogonal subspaces of the VAE latent space. Using only a few seconds of labeled speech signals generated with an artificial speech synthesizer, we propose a method to identify the latent subspaces encoding $f_0$ and the first three formant frequencies, and we show that these subspaces are orthogonal. Based on this orthogonality, we develop a method to accurately and independently control the source-filter speech factors within the latent subspaces. Without requiring additional information such as text or human-labeled data, this results in a deep generative model of speech spectrograms that is conditioned on $f_0$ and the formant frequencies, and which is applied to the transformation of speech signals. Finally, we also propose a robust $f_0$ estimation method that exploits the projection of a speech signal onto the learned latent subspace associated with $f_0$.
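The subspace idea in the abstract can be illustrated with a small numerical sketch. This is not the authors' implementation: it assumes that a factor's subspace is estimated by PCA on latent vectors in which only that factor varies, and it replaces the VAE encoder with synthetic 16-dimensional latents whose factor directions are orthogonal by construction. The function names (`fit_subspace`, `subspace_overlap`, `move_in_subspace`) and all dimensions are hypothetical.

```python
import numpy as np

def fit_subspace(latents, n_components):
    """PCA via SVD: return the data mean and an orthonormal basis (columns)."""
    mean = latents.mean(axis=0)
    _, _, vt = np.linalg.svd(latents - mean, full_matrices=False)
    return mean, vt[:n_components].T

def subspace_overlap(basis_a, basis_b):
    """Largest cosine of the principal angles; 0 means orthogonal subspaces."""
    return np.linalg.svd(basis_a.T @ basis_b, compute_uv=False).max()

def move_in_subspace(z, mean, basis, new_coords):
    """Set z's coordinates inside the subspace, leaving the complement untouched."""
    coords = basis.T @ (z - mean)
    return z + basis @ (new_coords - coords)

rng = np.random.default_rng(0)
latent_dim, n_samples = 16, 200

# ASSUMPTION: synthetic stand-in for encoder outputs. The factor directions
# are made orthogonal on purpose; the paper reports orthogonality as an
# empirical finding for a trained VAE, which this toy data does not verify.
q, _ = np.linalg.qr(rng.normal(size=(latent_dim, 3)))
d_f0, D_formant = q[:, 0], q[:, 1:3]

# Latents where only the "f0" factor varies, plus small noise.
t = rng.uniform(-1.0, 1.0, size=n_samples)
latents_f0 = t[:, None] * d_f0[None, :] + 0.01 * rng.normal(size=(n_samples, latent_dim))

# Latents where only two "formant" factors vary, plus small noise.
s = rng.uniform(-1.0, 1.0, size=(n_samples, 2))
latents_formant = s @ D_formant.T + 0.01 * rng.normal(size=(n_samples, latent_dim))

mean_f0, U_f0 = fit_subspace(latents_f0, n_components=1)
mean_formant, U_formant = fit_subspace(latents_formant, n_components=2)
overlap = subspace_overlap(U_f0, U_formant)

# Move an arbitrary latent vector to a chosen coordinate in the f0 subspace.
z = rng.normal(size=latent_dim)
z_new = move_in_subspace(z, mean_f0, U_f0, np.array([0.5]))
print("subspace overlap:", overlap)
```

Because the two estimated subspaces are nearly orthogonal in this toy setup, editing the coordinate in the `U_f0` subspace leaves the projection onto `U_formant` almost unchanged, which is the property the abstract relies on for independent control.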
Journal
Journal title: Speech Communication
Year: 2023
ISSN: 1872-7182, 0167-6393
DOI: https://doi.org/10.1016/j.specom.2023.02.005